Task 1: Download NYC City Council District Boundaries
To launch this project, it is essential to develop a high-performance map of NYC’s Council Districts. Transforming the city’s raw boundary data into a fast, universally compatible WGS84 layer facilitates powering all visualizations and spatial joins, making it possible to link every tree to its specific district. This includes calculating the square mileage of each district, a key step for enabling fair comparisons.
Show code
#| label: task1-download-districts#| cache: true#| code-fold: true#| code-summary: "Task 1: download → unzip → st_read → WGS84 → simplify → id + area"download_nyc_districts_mp03 <-function(zip_url ="https://s-media.nyc.gov/agencies/dcp/assets/files/zip/data-tools/bytes/city-council/nycc_25c.zip",data_dir ="data/mp03",simplify =TRUE,dTolerance =8){ fs::dir_create(data_dir)# 1) Download if needed zip_file <-file.path(data_dir, "nycc_districts.zip")if (!fs::file_exists(zip_file)) {download.file(zip_url, zip_file, mode ="wb", quiet =TRUE) }# 2) Unzip if needed unzip_dir <-file.path(data_dir, tools::file_path_sans_ext(basename(zip_url)))if (!fs::dir_exists(unzip_dir)) { utils::unzip(zip_file, exdir = unzip_dir) }# 3) Read shapefile shp <-list.files(unzip_dir, pattern ="\\.shp$", full.names =TRUE, recursive =TRUE)if (length(shp) ==0) stop("No .shp found in: ", unzip_dir) d <-st_read(shp[1], quiet =TRUE)# 4) Transform to WGS84 d <-st_transform(d, crs ="WGS84")# 5) Simplify for speedif (isTRUE(simplify)) {if (rlang::is_installed("rmapshaper")) { d$geometry <- rmapshaper::ms_simplify(d$geometry, keep =0.10, keep_shapes =TRUE) } else { d$geometry <-st_simplify(d$geometry, dTolerance = dTolerance) } }# 6) Clean names, fix id, and compute area (m² and km²) d <- janitor::clean_names(d) id_col <-intersect(c("coundist","council","council_district","coun_dist","district"), names(d))[1]if (is.na(id_col)) stop("Could not find council district column in shapefile.") d <-rename(d, council_district =!!rlang::sym(id_col)) d$council_district <-as.integer(as.character(d$council_district))# Robust area in an equal-area CRS; store BOTH fields for later use area_m2 <- sf::st_area(sf::st_transform(d, 5070)) d$Shape_Area <-as.numeric(units::set_units(area_m2, m^2)) d$area_km2 <- d$Shape_Area /1e6 d}# -> Primary datasetnyc_districts <-download_nyc_districts_mp03()
Figure 1 provides an interactive view of NYC’s 51 council districts, enabling users to explore district boundaries, borough affiliations, and land areas through hover interactions. Also, each council district is rendered in a distinguishing color to enhance visual distinction. Most importantly, this map serves as the foundation for district-level comparisons throughout the project.
Show code
#| label: task1-districts-colorful#| fig.width: 9#| fig.height: 7#| message: false#| warning: false#| code-fold: true#| code-summary: "Interactive: hover shows District • Borough • Area (sq mi)"# 1) Compute area in *square miles* from a feet-based projection (EPSG:2263)dist_2263 <- sf::st_transform(nyc_districts, 2263)# area in ft^2 -> mi^2 (5280 ft in a mile)area_mi2 <-as.numeric(sf::st_area(dist_2263)) / (5280^2)# 2) Build hover fields (note the 'sq mi' units in the label)nyc_districts_hover <- nyc_districts |> dplyr::mutate(borough = dplyr::case_when( council_district >=1& council_district <=10~"Manhattan", council_district >=11& council_district <=18~"Bronx", council_district >=19& council_district <=32~"Queens", council_district >=33& council_district <=48~"Brooklyn", council_district >=49& council_district <=51~"Staten Island",TRUE~NA_character_ ),area_mi2 = area_mi2,tooltip_text = glue::glue("District: {council_district}\n","Borough: {borough %||% '—'}\n","Area: {scales::comma(area_mi2, accuracy = 0.01)} sq mi" ) )# 3) Interactive map (legend hidden to force visitors to hover supplies details)p_districts <-ggplot() + ggiraph::geom_sf_interactive(data = nyc_districts_hover,aes(fill =factor(council_district),tooltip = tooltip_text,data_id = council_district),color ="black", linewidth =0.6 ) +scale_fill_hue(l =70, c =80, h =c(0, 330), guide ="none") +coord_sf(datum =NA) +labs(title ="Figure 1: NYC Council Districts (Interactive)",subtitle ="Hover for District, Borough, and Area (square miles)" ) +theme_map_min()ggiraph::girafe(ggobj = p_districts,options =list( ggiraph::opts_hover(css ="fill-opacity:0.95; stroke:#111; stroke-width:1.5px;"), ggiraph::opts_tooltip(css =paste("background:rgba(0,0,0,0.92); color:#fff;","padding:12px 14px; border-radius:10px;","font-size:16px; line-height:1.35;","max-width:420px; white-space:pre-line;" )), ggiraph::opts_sizing(rescale =TRUE) ))
With district boundaries established, the subsequent section highlights NYC trees data, showing how downloading and processing the individual tree records that populate this project’s geographic framework.
Task 2: Download Tree Points
NYC’s 2015 Street Tree Census contains detailed records for every street tree in the city. Retrieving this data through the NYC OpenData Socrata API in GeoJSON format preserves each tree’s precise GPS coordinates along with characteristics such as species, health status, and trunk diameter.
To responsibly download the complete dataset of over 680,000 trees, the following processes were undertaken:
Page through the API: in batches of 50,000 records (the maximum allowed per request)
Cache each page locally: as individual files to avoid re-downloading if the analysis is re-run
Combine all pages: into a single spatial dataset (an sf object with POINT geometry)
Store fast-access copies: as both .rds and .gpkg files for future renders
This approach alleviates straining the data provider’s server capacity while ensuring reproducible analysis. The final dataset contains over one million trees with complete spatial and attribute information.
#| label: task2-spatial-join#| cache: true#| message: false#| warning: false#| code-fold: true#| code-summary: "JOIN trees to districts + borough + areas (km² & mi²)"# Ensure the council district id exists and is integerif (!"council_district"%in%names(nyc_districts)) {cand <-intersect(names(nyc_districts),c("CounDist","coun_dist","council_district","c_district","coundist","council","district"))stopifnot(length(cand) >0)nyc_districts <- nyc_districts |>dplyr::rename(council_district =!!rlang::sym(cand[1])) |>dplyr::mutate(council_district =as.integer(as.character(council_district)))}# Compute district areas if missing (accurate geodesic-ish area via projected meters)if (!all(c("area_km2","area_mi2") %in%names(nyc_districts))) {nyc_districts <- nyc_districts |>dplyr::mutate(.geom_m = sf::st_transform(geometry, 3857), # metersarea_km2 = units::set_units(sf::st_area(.geom_m), km^2) |> units::drop_units(),area_mi2 = area_km2 *0.3861021585) |>dplyr::select(-.geom_m)}# Add borough labels if missingif (!"borough"%in%names(nyc_districts)) {nyc_districts <- nyc_districts |>dplyr::mutate(borough = dplyr::case_when(council_district %in%1:10~"Manhattan",council_district %in%11:18~"Bronx",council_district %in%19:32~"Queens",council_district %in%33:48~"Brooklyn",council_district %in%49:51~"Staten Island",TRUE~NA_character_))}# Align CRS if (sf::st_crs(nyc_trees) != sf::st_crs(nyc_districts)) {nyc_trees <- sf::st_transform(nyc_trees, sf::st_crs(nyc_districts))}# Points to Polygon overlaytrees_with_districts <- sf::st_join(nyc_trees, nyc_districts, join = sf::st_intersects,left =TRUE)# A geometry-free copy for table worktrees_with_districts_clean <- trees_with_districts |> sf::st_drop_geometry()# Table to ensure functioningassign_check <- tibble::tibble(total_trees =nrow(trees_with_districts),assigned =sum(!is.na(trees_with_districts$council_district)),unassigned = total_trees - assigned,assign_rate = assigned / total_trees)
Note: This preliminary join validates the workflow structure. The definitive district assignments occur in Task 4 using stricter geometric criteria.
Task 3 Plot All Tree Points
When working with a large volume of spatial points, developing multiple visualizations enhances research efforts. This current section contains several views of the data to support critical analysis. Specifically:
Verify data quality: Confirms that tree points align with district boundaries
A raw points map shows actual tree locations
Identify spatial patterns: Spot concentrations, gaps, and anomalies
The choropleth map aggregating density by district
Test visualization techniques: Determine which approaches best communicate density
Including a hex-binned density mapprovides cell-level granularity.
These unique visualizations validate the accuracy of the spatial joins needed to address the questions in Task 4, ensuring that each tree point is correctly matched to its council district.
Figure 1 reveals NYC’s urban canopy distribution across all 51 council districts. Each green dot represents a street tree, with over one million trees displayed as a random sample of 200,000 for visual clarity. The most striking pattern, though predictable given NYC’s development history, is the drastic inequality in tree density. Residential neighborhoods like Bayside, Greenwich Village, and Riverdale show dense green clusters, while industrial areas appear as “tree deserts.” This raw point data provides the geographic foundation for the district-level analyses that follow, ensuring each tree point is correctly matched to its council district.
Show code
#| label: task3-allpoints#| fig.width: 10#| fig.height: 8#| cache: true#| code-fold: true#| code-summary: "All trees (sf POINT) over council districts (sf POLYGON)"#| message: false#| warning: false# Show a subset for speedset.seed(42)n_show <-min(200000L, nrow(nyc_trees))nyc_trees_show <-if (nrow(nyc_trees) > n_show) { nyc_trees[sample.int(nrow(nyc_trees), n_show), ]} else { nyc_trees}p_all_points <-ggplot() +# Layer 1: district POLYGONSgeom_sf(data = nyc_districts,fill ="grey98", color ="grey60", linewidth =0.25,inherit.aes =FALSE ) +# Layer 2: tree POINTSgeom_sf(data = nyc_trees_show,color ="#228B22", alpha =0.06, size =0.03, shape =16,inherit.aes =FALSE ) +coord_sf(datum =NA) +labs(title ="Figure 1: NYC Tree Points over Council Districts",subtitle = glue::glue("{scales::comma(nrow(nyc_trees))} trees across 51 districts ","(showing {scales::comma(nrow(nyc_trees_show))})" ),caption ="Data: NYC OpenData (hn5i-inap) & NYC Dept of City Planning (nycc_25c)" ) +theme_minimal(base_size =11) +theme(axis.text =element_blank(),axis.title =element_blank(),panel.grid =element_blank() )p_all_points
Figure 2 highlights district-level tree density measured in trees per square mile. Darker shades show denser urban canopy coverage. Through this aggregated view, borough-level patterns emerge, with Manhattan and certain districts in Brooklyn showing higher densities (e.g., population and infrastructure) than the other districts. This metric also normalizes raw tree counts by land area, allowing for fairer comparisons between districts with varying land areas.
Show code
#| label: task4-density-data#| cache: true#| message: false#| warning: false# Prepare counts (no geometry)counts <- trees_with_districts_clean |> sf::st_drop_geometry() |> dplyr::count(council_district, name ="trees")# Ensure area_mi2 exists (robust fallback)if (!"area_mi2"%in%names(nyc_districts)) { nyc_districts <- nyc_districts |> dplyr::mutate(area_mi2 = units::set_units( sf::st_area(sf::st_transform(geometry, 3857)), mi^2 ) |> units::drop_units() )}# Keep the sf object (nyc_districts) on the left of the joindens_by_dist <- nyc_districts |> dplyr::select(council_district, borough, area_mi2, geometry) |> dplyr::left_join(counts, by ="council_district") |> dplyr::mutate(trees = dplyr::coalesce(trees, 0L),trees_per_mi2 = trees / area_mi2 )
Show code
#| label: task3-density-choropleth-mi2#| fig.width: 9#| fig.height: 7#| message: false#| warning: false#| code-fold: true#| code-summary: "Choropleth: trees per square mile (dark = high)"ggplot(dens_by_dist) +geom_sf(aes(fill = trees_per_mi2), color ="white", linewidth =0.35) +scale_fill_viridis_c(option ="mako", direction =-1, labels = scales::label_comma(),name ="Trees / mi²" ) +coord_sf(datum =NA) +labs(title ="Figure 2: Tree Density by Council District",subtitle ="Choropleth highlighting trees per square mile" ) +theme_minimal(base_size =12) +theme(panel.grid =element_blank(),legend.title =element_text(face ="bold") )
Figure 3 shows a hex-binned density map that aggregates individual tree points into hexagonal cells, approximately measuring 1,312 feet, or 400 meters, across. This cell size is appropriate for city-scale analysis, facilitating individual tree placement at a macro level while revealing neighborhood-level patterns at a micro level.
Unlike a choropleth, which maintains district boundaries, the hex grid is independent of political boundaries. Consequently, this difference reveals the true spatial distribution of trees. For example, the hex grid shows continuous bands of high density that cross district lines, leading to the inclusion of parks and street trees. The color gradient makes it easy to identify density “hotspots” and “coldspots” at a glance.
Show code
#| label: task3-hex#| fig.width: 10#| fig.height: 8#| code-fold: true#| code-summary: "Hex-binned density map (feet, dark = dense)"#| message: false#| warning: falsemake_hex_counts <-function(points_sf, districts_sf, cellsize_ft =1312){# Convert feet to meters for st_make_grid (uses meters internally) cellsize_m <- cellsize_ft *0.3048 bbox <-st_as_sfc(st_bbox(districts_sf)) grid <-st_make_grid(st_transform(bbox, 3857),cellsize = cellsize_m, what ="polygons", square =FALSE) grid <-st_transform(st_sf(hex_id =seq_along(grid), geometry = grid), 4326) idx <-st_within(points_sf, grid) tab <-as.data.frame(table(unlist(idx)), stringsAsFactors =FALSE)names(tab) <-c("hex_id","n"); tab$hex_id <-as.integer(tab$hex_id)left_join(grid, tab, by ="hex_id") |>mutate(n =coalesce(n, 0L))}# Create hexes at ~1,312 feet (~400m)hex_counts <-make_hex_counts(nyc_trees, nyc_districts, cellsize_ft =1312)p_hex <-ggplot() +geom_sf(data = hex_counts, aes(fill = n), color =NA) +scale_fill_viridis_c(option ="C",trans ="sqrt",direction =-1, labels =label_number(scale_cut =cut_short_scale()) ) +geom_sf(data = nyc_districts, fill =NA, color ="grey35", linewidth =0.25) +labs(title ="Figure 3: NYC Tree Density (hex-binned)", subtitle ="~1,312 ft hexes (≈400 m)", fill ="Trees" ) +theme_void() +theme(legend.position ="right")p_hex
Extra Credit Opportunity #01: Improved Tree Map Visualizations